System and Method for Exploring 3D Scenes by Pointing at a Reference Object

ABSTRACT

A system and method are described for enhancing location-based services by enabling spatial database systems to respond to or answer spatial queries that use a reference object to identify objects or features of interest in an environmental scene presented to a system user. The system and method enhance pointing technology by permitting system users to use queries to identify objects or features within the system user's field of view by pointing at the reference object or feature and linking it to the object of interest by using spatial prepositions.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 12/645,243, entitled “System and Method for Exploring 3D Scenes by Pointing at a Reference Object,” now U.S. Pat. No. 8,745,090, which claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application No. 61/139,900, filed Dec. 22, 2008, entitled “System and Method for Exploring 3D Scenes by Pointing at Reference Object,” both incorporated by reference herein in their entirety.

FIELD OF INVENTION

The present invention relates generally to computer-based systems and methods for exploring environmental scenes represented in a computer-based information system and referring to data or initiating actions associated with individual objects within and beyond such environmental scenes. More specifically, the present invention relates to computer-based distributed systems and methods for exploring a spatial scene and manipulating the objects within the environmental scene (i) directly, by pointing at the objects, or (ii) indirectly, by pointing at any visible object as a reference and using this reference to select the object or objects of interest.

BACKGROUND OF THE INVENTION

Pointing at spatial objects and features with a handheld device to retrieve information about that object or feature stored in a computer-based system database is becoming increasingly popular, predominantly in domains such as location-based services (LBS), gaming, and augmented reality. The current generation of handheld devices, such as cellular phones and personal digital assistants (PDAs), includes all the components and sensors required to derive the system user's position and direction of pointing, as well as the modules for wireless communication with computer-based information system databases.

Sensor readings acquired by this current generation of handheld devices can be transmitted to certain services or databases, such as spatial databases. In the case of spatial databases, this data will be used to identify and retrieve from such databases information about the object or feature at which the handheld device is currently being pointed. The information that is retrieved will be processed by the wirelessly connected computer-based system and transmitted back to the handheld device, where it will be used to answer a system user's question such as, “What is that object (or feature) over there?”.

The set of questions that may be answered using the information retrieved from the databases described above, based on the pointing-at approach, has been restricted to properties and attributes related to one explicit object. More specifically, the retrieved information is restricted to the single object at which the handheld device is being pointed. Accordingly, answers to questions such as “What is left of that building over there?” or “Are there any hospitals behind that building over there?” or even “What is to the North of that object over there?” are not included in the retrieved information.

It would be highly desirable to have a system and method able to answer questions relating to a scene presented to a system user that address more than just the one object or feature being pointed at. A system and method providing such capabilities would also increase the usability of pointing devices. Further, a system and method with these capabilities would enhance the interaction between a system user and an expert system (e.g., a GIS) relating to location-based services, from which not only information about single objects may be obtained, but which also uses a reference object or feature to identify other objects, for example, having the capability to define and identify “The building to the left of that building!”. It would also be desirable to have a system and method that could be used to initiate an action or launch a service associated with an object or feature simply by directly or indirectly pointing at the reference object or feature in order to identify the object of interest.

The present invention provides a system and method to overcome the problems of pointing systems of the past.

SUMMARY OF THE INVENTION

The present invention is a system and method that enhances location-based services by enabling computer-based spatial database systems to respond to or answer spatial queries that use a reference object to identify objects or features of interest in the environmental scene presented to a system user. The present invention enhances pointing technology by providing a system and method that allows system users to use queries to identify objects or features within the system user's field of view by pointing at the reference object or feature, and linking it to the object of interest by using spatial prepositions.

According to the present invention, a system user will point at an object or feature in his/her environment, such as within the system user's visual field, for the purpose of identifying the object or feature. The system user can then use that object or feature as a reference for requesting additional information or initiating actions associated with other objects located in the vicinity of the object or feature that was first identified. This is done by pointing at the reference object or feature instead of performing the conventional method of specifically requesting information or initiating an action associated with each object or feature in the system user's environment, such as within the system user's visual field.

The system of the present invention includes a scene generator that describes the spatial scene as perceived by the system user in terms of the spatial configuration associated with the environment. According to the present invention, “spatial configuration” is understood to include an absolute frame of reference, i.e., North, South, East, West; an egocentric frame of reference, i.e., described from the system user's point of view; and an intrinsic frame of reference, i.e., based on the object's or feature's structure. Such meta-level scene descriptions will define the semantics of the scene and allow interaction with objects or features surrounding the reference object or feature to which the pointing device is pointing. Further, by describing the visual scene presented to the system user according to the present invention, this scene description also may be used for initiating simple queries about the objects or features located near the reference object or feature. Such a scene description will also permit initiating highly complex queries, such as finding buildings with a specific spatial relation to the reference object or feature.

The scene description according to the present invention will support system user interaction with visual objects or features and, by way of indirect reference, also interaction with objects or features that are hidden by other objects or features. Examples of such interactions are initiating a service related to the objects or features, such as a reminder service, or performing actions like switching lights on or off.

Preferably, the present invention permits deriving a system user-centered spatial frame of reference, the egocentric frame of reference. Such an egocentric frame of reference may be used to qualitatively describe the configuration of the spatial scene as perceived by the system user. A scene description of this type will provide the spatial relations between objects or features within and beyond what is visually seen by the system user. The egocentric frame of reference will also form the basis for answering questions related to objects or clusters of objects that surround the reference object or feature.

Mobile devices, principally handheld devices, that include the novel features of the present invention may be adapted for wireless communication with a virtual system or system database on a system server that contains stored representations of objects and features in the system user's environment. Mobile devices incorporating the present invention may also include sensors to determine the spatial relation between the system user and the object or feature being pointed at by the system user using the mobile device. This spatial relation, or axis between the system user and the object or feature, may be used to quantitatively and qualitatively describe the location of objects or features surrounding the object or feature at which the user is pointing. More specifically, the spatial relationship may be used to quantitatively and qualitatively describe the location of surrounding objects or features in terms of their position with respect to the axis established between the system user and the reference object or feature. This description may include annotations for an absolute frame of reference, annotations for the egocentric frame of reference, as well as annotations with respect to the object's or feature's intrinsic frame of reference as it relates to the reference object or feature.

According to the present invention, annotating the environmental scene as perceived by system users preferably will establish the foundation for answering questions such as “What is the building to the left of that building?” or “Is there a church behind that building?”. These annotations also will provide the foundation for initiating actions or services associated with these buildings, such as “Switch on the light of the building behind that building!”

The present invention will be described in greater detail in the remainder of the specification referring to the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a block diagram of a representative system for carrying out the present invention.

FIG. 2 shows a workflow for carrying out an exemplary method of the present invention.

FIG. 3 shows a representation of the query regions associated with the system and method of the present invention.

FIG. 4A shows an absolute frame of reference according to the system and method of the present invention.

FIG. 4B shows an egocentric frame of reference according to the system and method of the present invention.

FIG. 4C shows an intrinsic frame of reference according to the system and method of the present invention.

FIG. 5 shows a representative example of a visual scene which system users may use for a projective scene description according to the system and method of the present invention.

FIG. 6 shows an example of a set of projected topological relations derived according to the system and method of the present invention.

DETAILED DESCRIPTION OF THE DRAWINGS

The present invention is a system and method for enhancing location-based services by enabling computer-based spatial database systems to respond to or answer spatial queries that use a reference object to identify objects or features of interest in the environmental scene surrounding a system user. The present invention is applied to pointing technology in such a way that system users are allowed to use queries to identify objects or features within the system user's field of view by pointing at the reference object or feature, and then associating the object or feature of interest by using spatial prepositions with respect to the reference object or feature.

FIG. 1, generally at 100, shows a general overview of the system of the present invention. FIG. 1 also shows the general information data flow among the elements of the system of the present invention. FIG. 2, generally at 200, shows a workflow according to the present invention, which will be described in greater detail subsequently.

Before discussing FIG. 1 in detail, it is understood that the system of the present invention includes a system client that provides a system user with his/her current position, the direction of pointing, as well as a query interface and a module for wireless communication with a system server. The system server of the present invention also includes system databases (not shown) that store representations of the system user's environment; a scene generator that generates a scene as perceived by the system user; and a scene graph annotator that qualitatively describes the scene and provides the input for answering the system user's queries. FIG. 1 will now be described in detail.

Referring to FIG. 1, generally at 100, system 104 of the present invention is shown in relation to the real-world 102. The real-world 102 represents a visual scene that would be viewed by the system user. System 104 includes system client 106, system server 108, and system storage, preferably in the form of system databases (not shown), for storing at least information relating to the environmental scenes that will be experienced by the system user. As shown at 102, system user 124 will be located in environment 122 and will see visual scene 120, which is part of environment 122. Visual scene 120 includes one or more spatial objects 126 that may be pointed at by system user 124.

As stated, system 104 includes system client 106, system server 108, and system storage (not shown). System user 124 interacts directly with system client 106, which includes Query Interface 132 and Positioning and Pointing Modules 134. System user 124 principally interacts with system client 106 at Query Interface 132. When system user 124 points system client 106 at spatial object 126, Positioning and Pointing Modules 134 can determine the position of the system user and the direction of pointing, and supply this information to Query Interface 132 for further processing as part of a query statement.

More specifically with respect to system client 106, it includes (i) Query Interface 132 that is used by the system user for the formulation of queries, browsing results, and receiving and displaying information; (ii) a sensor for deriving the system user's current position within an absolute three-dimensional (3-D) frame of reference, e.g., WGS84 for GPS; and (iii) a sensor for deriving direction as a two-valued vector representing yaw and pitch, i.e., azimuth and elevation angle in geodetic terms, whereby (ii) and (iii) are part of Positioning and Pointing Modules 134. Further, system client 106 includes a communication module that receives the query formulation from Query Interface 132 and prepares it for submission to system server 108, where it is processed.
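
By way of illustration and not limitation, the following Python sketch shows one possible data structure for the sensor readings and query formulation that system client 106 could submit to system server 108. All names in the sketch are hypothetical; the present invention does not prescribe any particular implementation.

from dataclasses import dataclass

@dataclass
class PointingSample:
    # Position from the positioning sensor, in an absolute 3-D frame (e.g., WGS84).
    lat: float      # latitude in degrees
    lon: float      # longitude in degrees
    alt: float      # height in meters
    # Direction from the pointing sensor, as a two-valued vector.
    yaw: float      # azimuth of the pointing direction in degrees
    pitch: float    # elevation angle of the pointing direction in degrees

@dataclass
class ClientQuery:
    topic: str               # e.g., "Are there any HOSPITALs"
    preposition: str         # e.g., "BEHIND"
    sample: PointingSample   # captured position and pointing direction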

System server 108 receives query statements including the system user's location and pointing direction with respect to environment 122 in which system user 124 is located. System server 108 at 136 contains a 3-D representation of environment 122, in which the object or feature of interest is determined based on the location and pointing direction of system client 106. As stated, the system user's geographic location information may be provided to system server 108 by, for example, a GPS sensor associated with system client 106. Other methods of determining the environmental scene may be used and still be within the scope of the present invention.

As stated, 3-D Representation of Environment 136 and Query Processor 138 communicate with Scene Generator 140 for determining the object that the system user is pointing at and for producing the 3-D scene graph. 3-D Representation of Environment 136 communicates scene information to Scene Generator 140, and Query Processor 138 communicates query information generated by the system user to Scene Generator 140, including location information and pointing direction. Scene Generator 140 produces a scene graph that corresponds to the scene in the system user's field of view. Scene Generator 140 also adds information about objects or features beyond the visual objects to the scene graph. Scene Graph Annotator 144 adds qualitative attributes about the spatial configuration of the scene perceived by the system user. Thematic Linker 142 is for linking generic information about objects or features to the scene graph. Examples of such generic information include, but are not limited to, publicly available tax information about buildings, history of monuments, etc.

Referring to system server 108, system user 124 defines a query statement using Query Interface 132, and the communication module (not shown) of system client 106 sends this statement to Query Processor 138. Query Processor 138 uses Scene Generator 140 to generate a visual scene from 3-D Representation of Environment 136 as perceived by system user 124. This scene is used for identifying the spatial objects or features of interest in the system user's visual scene 120. Scene Generator 140 further enhances this scene with objects that are hidden to system user 124. Query Processor 138 processes the received queries and employs Scene Generator 140 and Thematic Linker 142. Specifically, Query Processor 138 transmits the system user's location and pointing direction to Scene Generator 140, and transmits requests for specific information about objects or features in the system user's environment (visible or invisible) to Thematic Linker 142.

Scene Generator 140 provides a visual scene and the identity of the object or feature that the system user pointed at to Scene Graph Annotator 144. Scene Graph Annotator 144 carries out its above-described actions by adding information about the absolute, egocentric, and intrinsic setup of the scene. Scene Graph Annotator 144 sends the fully described scene to Thematic Linker 142. Thematic Linker 142 enhances the scene description based upon inputs from Query Processor 138 and Scene Graph Annotator 144, and transmits the answer to the query defined by system user 124 back to Query Interface 132.

It is understood that each of the elements shown in system server 108 may be separate modules or integrated into one or more modules and still be within the scope of the present invention. It is also understood that the elements of system client 106 may be separate modules or integrated into one or more modules and still be within the scope of the present invention.

Referring to FIG. 2, generally at 200, a workflow according to the present invention will be described. In general, a workflow according to the present invention will include (i) capturing the position, azimuth, and elevation angle of the pointing direction of the system client, and submitting the captured information to the system server, (ii) generating a two-dimensional (2-D) scene representation of the visual scene presented to the system user from the 3-D representation on the system server and identifying a reference object, (iii) generating a scene graph using the system user's position and a reference object, (iv) enhancing the scene graph with qualitative annotations for topographical relations, absolute references, egocentric references, and intrinsic references, and (v) transmitting the scene graph back to the system client for exploration and interaction with spatial objects by the system user.

Again referring to FIG. 2, at 202, an input query statement is generated by system user 124 using Query Interface 132 of system client 106. Query Interface 132 provides the system user with the means to define the terms and conditions of the query statement. Query statements are set forth as a combination of the system user's location and pointing direction, from which a reference object or feature is determined in Scene Generator 140; a term defining the spatial relation between the reference object and the object or objects of interest; and an expression that indicates the topic of interest or action to be performed. With regard to the latter expression, an example of such a topic of interest would be historic buildings that the system user may be interested in, or an action such as launching a bookmarking service associated with the object or feature of interest.

Noting the preceding with regard to the elements of a query statement, if a system user wanted to know if there is a hospital behind a building that is within his/her field of view, an example of a statement would be the following:

“Are there any HOSPITALs BEHIND that BUILDING over THERE?”

According to the query statement example just provided, HOSPITALs represents the object or feature of interest and its concept; BEHIND is the spatial relation between the object of interest and the reference object. “BUILDING over THERE” is the reference object that establishes the axis between the system user and reference object, which is added to the query statement by pointing at the reference object, i.e., as location and pointing direction. The remaining part of the query statement, the system user's location, would be determined based on the position of the system client using, for example, a GPS sensor associated with the system client. A preferable form of a query statement will include at least three components: queryTopic, spatialPreposition, and Location (HERE, THERE, SCENE). An example of a possible query statement is shown below in BNF (“Backus Naur Form”) notation:

<queryStatement> ::= <queryTopic> <spatialPreposition> { <HERE_Location> | <THERE_Location> | <SCENE> }
<queryTopic> ::= “What is” | “Are there any” <concept> | “Is” <instance>
<concept> ::= type of object (Note: This is input by the user)
<instance> ::= name of object (Note: This is input by the user)
<spatialPreposition> ::= <absolute> | <egocentric> | <intrinsic>
<absolute> ::= “North” | “East” | “South” | “West” | “NE” | “SE” | “SW” | “NW”
<egocentric> ::= { <egoLatPrep> | <egoLongPrep> | <egoVertPrep> } “of”
<egoLatPrep> ::= “left” | “right”
<egoLongPrep> ::= “in front” | “behind”
<egoVertPrep> ::= “above” | “below”
<intrinsic> ::= “in” | “left” | “right” | “behind” | “in front” | “on top” | “underneath”
<HERE_Location> ::= User location, pitch, yaw, radius
<THERE_Location> ::= User location, pitch, yaw, radius (Note: Used for identifying reference object)
<SCENE> ::= User location, pitch, yaw, visual-angle [, maxDistance]

As stated, the exemplary query statement includes three components: the queryTopic, the spatialPreposition, and the Location (HERE/THERE/SCENE). The query topic consists of a beginning phrase, such as “What is,” and the type and name of the object of interest. The spatial preposition will involve the absolute, egocentric, and intrinsic frames of reference. The HERE/THERE/SCENE component will involve the system user's location (HERE Location), the reference object's location (THERE Location), and the system user's view frustum, the SCENE. The HERE and THERE locations will be defined by pitch, yaw, and radius, and the SCENE will be defined by pitch, yaw, and visual angle. Depending on the configuration of the query statement, one or multiple spatial objects may be referenced. Further, the view frustum is not used to filter out occluded objects but rather annotates them, and therefore includes all objects within a specified region for purposes of identification.
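
For purposes of understanding and not of limitation, the preposition vocabularies of the grammar above may be encoded as in the following Python sketch, together with a helper that reports the frame of reference of a given preposition. The set names and the function are assumptions of this sketch only.

# Vocabularies taken from the BNF grammar above; names are hypothetical.
ABSOLUTE = {"North", "East", "South", "West", "NE", "SE", "SW", "NW"}
# Per the grammar, egocentric prepositions are composed with "of".
EGOCENTRIC = {p + " of" for p in
              ("left", "right", "in front", "behind", "above", "below")}
INTRINSIC = {"in", "left", "right", "behind", "in front",
             "on top", "underneath"}

def frame_of_reference(preposition):
    # Classify a spatial preposition; bare forms such as "behind" are
    # intrinsic, while the composed form "behind of" is egocentric.
    if preposition in ABSOLUTE:
        return "absolute"
    if preposition in EGOCENTRIC:
        return "egocentric"
    if preposition in INTRINSIC:
        return "intrinsic"
    raise ValueError("unknown spatial preposition: " + preposition)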

It is understood that the query statement provided above is by way of example and not limitation. Other forms of the query statement are contemplated and therefore are within the scope of the present invention. Further, the definitions that have been provided with respect to the components of the query statement also are provided by way of example, and other definitions may be used and still be within the scope of the present invention.

Referring to FIG. 3, generally at 300, the regions associated with the component HERE/THERE/SCENE will be described in greater detail. Referring to FIG. 3, the three query regions are HERE Location 302, THERE Location 304, and view frustum 313. HERE Location 302 corresponds to a query region of conventional location-based services (LBS). However, the query performed within the HERE Location includes directional information provided by directional sensors. As such, the query supports egocentric queries such as “What is to my left?” or “What is to my right?” or “Are there any hospitals in front of me?”. These queries are not contemplated by conventional LBS.

Preferably, system user 306 is located in the center of HERE Location 302. Radius r_Here 308 will define the size of the HERE Location. However, it is understood that other shapes may define the HERE Location and still be within the scope of the present invention.

In FIG. 3, system user 306 is pointing system client 106 in direction 318 at reference object 312 located at THERE Location 304. As such, THERE Location 304 is derived from system user 306's current HERE Location position and pointing direction 318. Thus, reference object 312 located at THERE Location 304 will have its position derived from the HERE Location position and pointing direction 318. The position of the reference object will be used to establish the egocentric frame of reference that will be discussed subsequently.
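
A minimal Python sketch of this derivation, assuming a local flat-earth approximation and a known distance along pointing direction 318 (in practice the distance would follow from intersecting the pointing ray with the 3-D representation of the environment), is the following; the function name and parameters are hypothetical.

import math

def there_location(lat, lon, yaw_deg, distance_m):
    # Project the HERE position along the pointing azimuth to obtain the
    # THERE position, using a local flat-earth approximation.
    R = 6371000.0  # mean Earth radius in meters
    d_north = distance_m * math.cos(math.radians(yaw_deg))
    d_east = distance_m * math.sin(math.radians(yaw_deg))
    dlat = math.degrees(d_north / R)
    dlon = math.degrees(d_east / (R * math.cos(math.radians(lat))))
    return lat + dlat, lon + dlon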

Preferably, THERE Location 304 includes reference object 312 at the center, and the size of the THERE Location will be defined by radius r_There 310. However, it is understood that other shapes, sizes, and positions of THERE Location 304 may be possible and still be within the scope of the present invention.

The egocentric frame of reference is defined by the relationship between system user 306 and reference object 312. The understanding of this relationship permits the present invention to address objects within the system user's field of view as being left or right of, in front of or behind, or above or below the reference object. Queries involving the THERE Location will return objects located within that region.

The SCENE is defined by view frustum 313. View frustum 313 is bounded by rays 314 and 316, and curved line segment 322 determined by radius r_Scene 320. For purposes of object identification, view frustum 313 will include all objects that are within the visual angle defined by rays 314 and 316 and optionally bounded by curved line segment 322. Preferably, view frustum 313 is understood to represent the system user's perceived field of view, including hidden or obscured objects within the visual angle. The view frustum also can be used to derive the radial distance for the THERE Location. For example, the THERE location of a query statement may extend from the center of the object or feature of interest 312 to the bounds of the view frustum.
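
A simple Python sketch of the frustum test, modeling view frustum 313 in two dimensions as the sector bounded by rays 314 and 316 and by radius r_Scene 320, might read as follows. Coordinates are assumed to be local east/north offsets in meters; consistent with the description above, occluded objects are not filtered out.

import math

def in_frustum(user_xy, yaw_deg, visual_angle_deg, r_scene, obj_xy):
    # True if the object lies within the visual angle and radius of the
    # SCENE region; hidden objects are deliberately not excluded.
    dx, dy = obj_xy[0] - user_xy[0], obj_xy[1] - user_xy[1]
    if math.hypot(dx, dy) > r_scene:
        return False
    bearing = math.degrees(math.atan2(dx, dy))         # 0 deg = North
    off = (bearing - yaw_deg + 180.0) % 360.0 - 180.0  # signed offset
    return abs(off) <= visual_angle_deg / 2.0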

FIG. 2 indicates that spatial prepositions are provided as part of the query statement generated by the system user at 202. The spatial prepositions the system user generates are based on the absolute frame of reference (FIG. 4A), the egocentric frame of reference (FIG. 4B), and the intrinsic frame of reference (FIG. 4C). These frames of reference will define a spatial relationship between a query topic and a query location. For example, in the exemplary query statement provided previously, HOSPITAL would be the query topic, the building that is directly being pointed at by the system user would be the reference object from which results are derived, and the area from BEHIND the reference object to the bounds of the SCENE would be the query location.

As stated, the query statement links the topic of the query to the query location by means of a spatial preposition. For purposes of understanding the present invention and not of limitation, a query statement may be “Is there a hospital North of the building?”. In this statement, the query topic, namely the search for a hospital, is well defined in a spatial relationship with the reference object, the building that is being pointed at. In this case, the query location would be “North of the building.” The type of spatial preposition will depend on the frame of reference in which the system user addresses objects. These frames of reference include, but are not limited to, absolute, egocentric, and intrinsic frames of reference. Although only three frames of reference have been described, it is understood that more or less than three may be used and still be within the scope of the present invention.

Referring to FIG. 4A, generally at 400, an absolute frame of reference according to the present invention will be described. Absolute frame of reference 400 describes the spatial relationship between system user 306, reference object 312, and an object of interest according to the query statement in terms of the global environment represented by compass rose 401. The spatial relationship also will be dependent on pointing direction 318. For example, North, South, East, and West are absolute spatial references for defining a query statement.

In FIG. 4A, there are eight reference areas that may be used if the absolute frame of reference is used for defining the position of an object of interest. The object of interest may be located at North area 402, Northeast area 404, Northwest area 406, East area 408, West area 410, South area 412, Southeast area 414, or Southwest area 416. However, it is understood that there may be more or less than eight areas for defining the position of an object of interest and still be within the scope of the present invention.
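
One possible Python sketch for mapping the position of an object of interest onto these eight absolute areas, assuming local east/north coordinates centered on reference object 312, is the following; the sector boundaries at 22.5-degree offsets are an assumption of this sketch.

import math

SECTORS = ["North", "NE", "East", "SE", "South", "SW", "West", "NW"]

def absolute_relation(ref_xy, obj_xy):
    # Compass bearing from the reference object to the object of interest,
    # quantized into the eight areas of FIG. 4A.
    dx, dy = obj_xy[0] - ref_xy[0], obj_xy[1] - ref_xy[1]
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0  # 0 deg = North
    return SECTORS[int(((bearing + 22.5) % 360.0) // 45.0)]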

The use of an absolute frame of reference will depend on the type of space in which the system user is located. As an example, geographic space in outdoor areas may be described by cardinal directions, which may not be conducive for use in describing the absolute frame of reference in virtual or indoor environments.

Referring to FIG. 4B, generally at 440, an egocentric frame of reference according to the present invention will be described. Egocentric reference frame 440 describes spatial relationships from system user 306's point of view, or from another person's point of view in circumstances when the system user is taking another person's position. For example, using an egocentric frame of reference, the query statement may be “the monument is in front of the building.” This query statement would mean the monument is in front of the building, between system user 306 and building 312. The spatial relationship also will depend on pointing direction 318.

In FIG. 4B, there are eight reference areas that may be used if the egocentric frame of reference is used for defining the position of an object of interest. These are defined in part by rays 442 and 444 emanating from system user 306. The object of interest may be located at In Front area 446, In Front-Left area 448, In Front-Right area 450, Besides-Left area 452, Besides-Right area 454, Behind area 456, Behind-Left area 458, or Behind-Right area 460. However, it is understood that there may be more or less than eight areas for defining the position of an object of interest and still be within the scope of the present invention.
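
The egocentric areas may be derived from the axis established between system user 306 and reference object 312. The following Python sketch illustrates one such classification in two dimensions; the band parameter, which widens the pure In Front, Behind, and Besides areas, is an assumption of this sketch rather than a feature prescribed by the present invention.

import math

def egocentric_relation(user_xy, ref_xy, obj_xy, band=5.0):
    # Axis from the system user to the reference object.
    ax, ay = ref_xy[0] - user_xy[0], ref_xy[1] - user_xy[1]
    # Offset of the object of interest from the reference object.
    ox, oy = obj_xy[0] - ref_xy[0], obj_xy[1] - ref_xy[1]
    norm = math.hypot(ax, ay)
    along = (ox * ax + oy * ay) / norm    # signed depth along the axis
    across = (ox * ay - oy * ax) / norm   # signed lateral offset (+ = right)
    if abs(along) <= band:
        depth = "Besides"                 # level with the reference object
    elif along < 0:
        depth = "In Front"                # between the user and the reference
    else:
        depth = "Behind"
    lateral = "" if abs(across) <= band else ("-Left" if across < 0 else "-Right")
    return depth + lateral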

Referring to FIG. 4C, generally at 470, an intrinsic frame of reference according to the present invention will be described. Intrinsic reference frame 470 describes spatial relationships in terms of the structural peculiarities of objects in a scene. For example, using an intrinsic frame of reference, the query statement may be “the monument in front of the building.” This query statement in intrinsic terms would mean the monument is located directly in front of building 312, regardless of the position of the monument relative to system user 306. The spatial relationship also will be dependent on pointing direction 318.

In FIG. 4C, there are eight reference areas that may be used if the intrinsic frame of reference is applied for defining the position of an object of interest. The object of interest may be located at In Front area 472, In Front-Left area 474, In Front-Right area 476, Left area 478, Right area 480, Behind area 482, Behind-Left area 484, or Behind-Right area 486. However, it is understood that there may be more or less than eight areas for defining the position of an object of interest and still be within the scope of the present invention.

It is understood that the phraseology for spatial prepositions can be other than what is represented in FIGS. 4A, 4B, and 4C and still be within the scope of the present invention.

Referring to FIG. 2, once the query statement has been generated and transmitted from system client 106 to Query Processor 138, the identity of the reference object is determined at 204 based on the transmitted information as it applies to 3-D Representation of Environment 136 in system server 108. Following the identification of the reference object, the present invention moves to step 206 to generate a scene graph using the reference object.

Referring to FIG. 1, Query Processor 138 will parse the query statement into location information for transmission to Scene Generator 140 and content filtering information for transmission to Thematic Linker 142. The parsed input to Scene Generator 140 results in a description of the scene surrounding the system user and information about the object or feature of reference being transmitted from Scene Generator 140 to Scene Graph Annotator 144. The scene graph that is generated by Scene Generator 140 is derived from the system user's position and pointing direction information. Preferably, the generated scene is an extended computational representation of the perspective scene used by system user 124 for spatial reasoning. The generated scene will include, but not be limited to, all of the objects (visible and occluded) in view frustum 313 (FIG. 3) of the system user, along with their relative spatial configuration. The relative spatial configuration, preferably, will be expressed as binary spatial relations reflecting absolute spatial prepositions between the single objects and intrinsic spatial prepositions for individual objects. For example, the absolute binary spatial relation between two objects may be expressed as “Object A is North of Object B” or “Object A is adjacent to Object B”.
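
By way of example only, a scene graph of the kind produced by Scene Generator 140 may be represented as in the following Python sketch, which stores objects (visible and occluded) together with binary spatial relations such as those quoted above; the class and method names are hypothetical.

from collections import defaultdict

class SceneGraph:
    def __init__(self):
        self.objects = {}                    # object id -> attributes
        self.relations = defaultdict(set)    # (subject, relation) -> objects

    def add_object(self, obj_id, **attrs):
        self.objects[obj_id] = attrs

    def relate(self, subject, relation, obj):
        # relate("A", "North of", "B") records "Object A is North of Object B".
        self.relations[(subject, relation)].add(obj)

graph = SceneGraph()
graph.add_object("A", kind="building", visible=True)
graph.add_object("B", kind="building", visible=False)  # occluded objects kept
graph.relate("A", "North of", "B")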

The scene graph generated at 206 in FIG. 2 is the input to Scene Graph Annotator 144. At 208 in FIG. 2, Scene Graph Annotator 144 adds egocentric spatial prepositions to the scene graph. The egocentric spatial prepositions will describe the position of the object of interest with respect to the system user's position relative to the reference object. It is further understood that absolute and intrinsic spatial prepositions may be calculated a priori and do not need to be derived in the scene graph annotator. Preferably, the spatial prepositions present in the annotated scene graph at Scene Graph Annotator 144 will correspond to the set of spatial prepositions available in Query Interface 132.

It is understood that Scene Graph Annotator 144 will enhance the scene graph. Although enhancing the scene graph with absolute information at 212, egocentric information at 214, and intrinsic information at 216 has just been described, it is understood that the scene graph may also be enhanced with topographical information at 210 and still be within the scope of the present invention.

Once the scene graph is fully annotated at Scene Graph Annotator 144, the annotated scene graph may be used to answer the system user's query. If the system user's query is a thematic query, such as a query for restaurants behind the object or feature of reference, Scene Graph Annotator 144 will send a filtering request for restaurants to Thematic Linker 142.

At 218 in FIG. 2, Thematic Linker 142 identifies the object or set of objects of interest, e.g., the building behind the building being pointed at or the hospitals behind the object being pointed at, and links or filters thematic data relating to the query topic to that object or set. An example of the thematic data being linked may be menus linked to restaurants or videos about historic monuments being linked to them. After the linking according to step 218 takes place, the annotated scene graph, along with the thematic content, is returned to system client 106, where it is presented to system user 124 using Query Interface 132.

As stated, Scene Graph Annotator 144 adds spatial prepositions to the scene graph produced by Scene Generator 140. These annotations are qualitative statements about the position of objects with respect to the axis defined by the system user's position and the position of the reference object being pointed at. Qualitative statements describe the configuration in terms of natural-language statements, rather than metric statements, and are, therefore, simpler for system users to understand. An example of a qualitative statement that describes the configuration is the spatial preposition LEFT, as in building A is LEFT of building B.

FIG. 5, generally at 500, describes projective scene description 502. According to FIG. 5, the set of qualitative statements comprises (i) a set of lateral and vertical prepositions and (ii) a set of prepositions reflecting the depth of the perceived scene. As shown at 503 in FIG. 5, a three-coordinate system is shown, which includes vertical at 504, depth at 505, and lateral at 507. The statements for lateral, vertical, and depth descriptions of the scene are derived in part from the work by Papadias and Egenhofer (1992), titled “Hierarchical Reasoning about Direction Relations,” which is incorporated herein by reference. This reference is directed to a set-theoretic approach to projective topological relations among spatial objects, which results in 169 distinct relations between two objects in the projective plane. This will be described in detail with reference to FIG. 6.

Again referring to FIG. 5, generally at 500, the relationship of visual scene 508 and real-world view 518 will be described. For identifying an object or feature, such as reference object 510, system user 506 will point system client 106 (FIG. 1) at reference object 510 using pointing direction 516. Projective scene description 502 is based on three-coordinate system 503. Based on this three-coordinate system, system user 506 will have a 2-D description of visual scene 508 derived from the 3-D representation of real-world view 518 for defining spatially separated building 510 and silo 512. In this scene, building 510 would be the reference object, and silo 512 could be defined based on its position relative to reference object 510.

FIG. 6, generally at 600, provides an example of a set of projected topological relations derived according to the present invention, which uses in part what is described in Papadias and Egenhofer (1992). Specifically, the present invention derives a set of relations from the scene graph and maps the individual configurations to qualitative statements, such as “right,” “left above,” or “behind.” According to FIG. 6, at 602, object of interest 606 would be to the left of reference object 604; at 608, object of interest 606 would be to the right of reference object 604; and at 610, object of interest 606 would be above reference object 604. Depth statements cannot be derived from the topological approach used by Papadias and Egenhofer, since configurations like “behind-left” are not reflected in the set of relations. The present invention therefore adds depth statements that define the distance between the system user and the reference object, so that additional depth statements may be derived for objects of interest in the visual scene that do not overlap with the reference object.
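
For purposes of understanding and not of limitation, the 169 relations arise from combining, per projection axis, the thirteen possible relations between the two objects' projected intervals (13 × 13 = 169). The following Python sketch classifies one axis, assuming axis-aligned bounding intervals; the relation names follow the usual interval-relation terminology rather than any wording prescribed by the present invention.

def interval_relation(a, b):
    # Thirteen possible relations between intervals a = (a1, a2) and
    # b = (b1, b2) along one projection axis.
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:  return "before"
    if b2 < a1:  return "after"
    if a2 == b1: return "meets"
    if b2 == a1: return "met-by"
    if a1 == b1 and a2 == b2: return "equals"
    if a1 == b1: return "starts" if a2 < b2 else "started-by"
    if a2 == b2: return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2: return "during"
    if a1 < b1 and b2 < a2: return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"

# Combining the lateral-axis and vertical-axis relations of two objects
# yields one of the 169 projective configurations referenced above.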

It is understood that the elements of the systems of the present invention may be connected electronically by wired or wireless connections and still be within the scope of the present invention.

The embodiments or portions thereof of the system and method of the present invention may be implemented in computer hardware, firmware, and/or computer programs executing on programmable computers or servers that each include a processor and a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements). Any computer program may be implemented in a high-level procedural or object-oriented programming language to communicate within and outside of computer-based systems.

Any computer program may be stored on an article of manufacture, such as a storage medium (e.g., CD-ROM, hard disk, or magnetic diskette) or device (e.g., computer peripheral), that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform the functions of the embodiments. The embodiments, or portions thereof, may also be implemented as a machine-readable storage medium, configured with a computer program, where, upon execution, instructions in the computer program cause a machine to operate to perform the functions of the embodiments described above.

The embodiments, or portions thereof, of the system and method of the present invention described above may be used in a variety of applications. Although the embodiments, or portions thereof, are not limited in this respect, the embodiments, or portions thereof, may be implemented with memory devices in microcontrollers, general purpose microprocessors, digital signal processors (DSPs), reduced instruction-set computing (RISC), and complex instruction-set computing (CISC), among other electronic components. Moreover, the embodiments, or portions thereof, described above may also be implemented using integrated circuit blocks referred to as main memory, cache memory, or other types of memory that store electronic instructions to be executed by a microprocessor or store data that may be used in arithmetic operations.

The descriptions are applicable in any computing or processing environment. The embodiments, or portions thereof, may be implemented in hardware, software, or a combination of the two. For example, the embodiments, or portions thereof, may be implemented using circuitry, such as one or more of programmable logic (e.g., an ASIC), logic gates, a processor, and a memory.

Various modifications to the disclosed embodiments will be apparent to those skilled in the art, and the general principles set forth herein may be applied to other embodiments and applications. Thus, the present invention is not intended to be limited to the embodiments shown or described herein.

1. A computer-implemented method for identifying objects of interest in an environmental scene based on a reference object, comprising the steps of: (A) pointing a pointing device at an object within a system user's visual field within the environmental scene, with the pointing device being capable of determining its geographic location and identifying the object to which the pointing device is pointing in a system user's visual field in a system user's environment; (B) generating a spatial scene with a scene generator that coincides with the system user's visual field, with the spatial scene including a spatial configuration of the system user's environment; (C) designating the object identified in step (A) as a reference object; (D) determining semantics of the spatial scene, with the semantics geographically associating objects within the system user's visual field with the reference object designated at step (C); (E) generating queries relating to one or more objects of interest within a system user's visual field based on pointing at the reference object designated at step (C); and (F) identifying one or more objects of interest within the system user's visual field based on the queries relating to such one or more objects of interest generated at step (E) based on pointing at the reference object designated at step (C).
2. The method as recited in claim 1, wherein the spatial configuration at step (B) includes at least an absolute frame of reference, egocentric frame of reference, and an intrinsic frame of reference.
3. The method as recited in claim 2, wherein the egocentric frame of reference includes an object position within the system user's visual field in the system user's environment described from a system user's geographic location.
4. The method as recited in claim 2, wherein the intrinsic frame of reference includes an object position based on an object structure.
5. The method as recited in claim 1, wherein identifying one or more objects of interest within the system user's visual field based on queries relating to such one or more objects of interest includes identifying objects of interest that are hidden by other objects.
6. The method as recited in claim 5, wherein a query includes statements that combine a system user's geographic location and pointing direction, a first expression defining a spatial relationship between the reference object and an object of interest, and a second expression that indicates a topic of interest or action to be performed.
7. The method as recited in claim 6, wherein query regions associated with a spatial scene include a HERE region, THERE region, and SCENE region.
8. The method as recited in claim 7, wherein the absolute frame of reference indicates a spatial relationship between the system user, reference object, and the object of interest according to a query statement.
9. The method as recited in claim 8, wherein the absolute frame of reference includes from one to eight areas for defining the object of interest with respect to the reference object.
10. The method as recited in claim 8, wherein the absolute frame of reference includes at least North, South, East, and West.
11. The method as recited in claim 3, wherein the egocentric frame of reference includes from one to eight areas for defining the object of interest with respect to the reference object.
12. The method as recited in claim 3, wherein the egocentric frame of reference includes being used for qualitatively describing the spatial configuration of the spatial scene with qualitative statements.
13. The method as recited in claim 2, wherein the intrinsic frame of reference includes from one to eight areas for defining the object of interest with respect to the reference object.
14. The method as recited in claim 2, wherein the spatial scene includes a projective two-dimensional scene generated from a three-dimensional scene.
15. The method as recited in claim 1, wherein the pointing device includes a mobile device.
16. A system for responding to spatial queries about a real-world object of interest using a reference object to identify the object of interest, comprising: a pointing device for pointing at a reference object in a real-world scene and generating a query statement, including a query topic and spatial preposition, relating to the object of interest according to a position of the reference object, with the pointing device determining a pointing device geodetic position and pointing direction, and with the pointing device communicating the query statement, and the pointing device geodetic position and pointing direction, to a system server; and a system server further comprising, a mapping module for receiving and processing a projective three-dimensional representation of an environment that contains a real-world scene including the reference object and the object of interest, a scene generator that connects to the mapping module and receives an output from the mapping module for generating a two-dimensional digital representation of the real-world scene including the reference object and object of interest, and identifies the reference object according to the pointing device geodetic position and pointing direction applied to the two-dimensional digital representation, a scene annotator for annotating the two-dimensional digital representation for identifying the object of interest according to the position of the reference object, and the query topic and spatial preposition of the query statement by linking the query topic and spatial preposition to an object in the two-dimensional digital representation, and communicating to the pointing device the identification of the object of interest.
17. The system as recited in claim 16, wherein the pointing device includes a mobile device.
18. The system as recited in claim 16, wherein the scene annotator includes using an egocentric frame of reference for identifying the object of interest.
19. The system as recited in claim 18, wherein the egocentric frame of reference identifies the object of interest according to the reference object position and pointing device geodetic position.
20. The system as recited in claim 16, wherein the system further includes an identifier module for identifying the object of interest in the two-dimensional digital representation according to the position of the reference object and the query topic and spatial preposition of the query statement by linking the query topic and spatial preposition to an object in the two-dimensional digital representation, and communicating to the pointing device the identification of the object of interest.